On Enhancing the Label Propagation Algorithm for Sentiment Analysis Using Active Learning with an Artificial Oracle

Authors

  • Anis Yazidi
  • Hugo Hammer
  • Aleksander Bai
  • Paal E. Engelstad
Abstract

A core component of Sentiment Analysis is the generation of sentiment lists. Label Propagation is unequivocally one of the most widely used approaches for generating sentiment lists from manually annotated seed words. Words that are situated many hops away from the seed words tend to receive low sentiment values. This inherent property of the Label Propagation algorithm poses a challenge in sentiment analysis. In this paper, we propose an iterative approach based on the theory of Active Learning [1] that attempts to remedy this problem without any need for additional manual labeling. Our algorithm is bootstrapped with a limited number of seeds. Then, at each iteration, a fixed number of “informative words” are selected as new seeds for labeling, according to different criteria that we elucidate in the paper. Subsequently, Label Propagation is retrained in the next iteration with the additional labeled seeds. A major contribution of this article is that, unlike mainstream Active Learning theory, which prompts a human user to act as the oracle and label the additional seeds, we generate the additional seeds with an Artificial Oracle. Consequently, we relieve the user of the cumbersome task of manual annotation while still achieving high performance. The lexicons were evaluated by classifying product and movie reviews. Most of the sentiment lexicons generated using Active Learning perform better than those generated by the Label Propagation algorithm alone.
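The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the propagation rule, the selection criteria, or how the Artificial Oracle assigns labels, so everything here is an assumption made for illustration: the `damping` factor (used to reproduce the decay of scores with hop distance), the choice of the highest-magnitude non-seed words as "informative words", and an oracle that simply takes the sign of the current propagated score. The function names `propagate` and `active_propagation` and the toy word graph are hypothetical.

```python
def propagate(graph, seeds, damping=0.5, sweeps=50):
    """Spread seed scores over a weighted word graph.

    graph maps each word to {neighbour: edge_weight}; seeds maps a few
    words to fixed scores in [-1, 1]. The damping factor is an assumption
    of this sketch: it is what makes scores shrink with every hop.
    """
    scores = {word: seeds.get(word, 0.0) for word in graph}
    for _ in range(sweeps):
        updated = {}
        for word, neighbours in graph.items():
            if word in seeds:
                updated[word] = seeds[word]  # seed scores stay clamped
                continue
            total = sum(neighbours.values())
            weighted = sum(w * scores[n] for n, w in neighbours.items())
            updated[word] = damping * weighted / total if total else 0.0
        scores = updated
    return scores


def active_propagation(graph, seeds, rounds=2, per_round=1, damping=0.5):
    """Iteratively promote non-seed words to seeds without human labels."""
    seeds = dict(seeds)  # copy so the caller's manual seeds are untouched
    for _ in range(rounds):
        scores = propagate(graph, seeds, damping=damping)
        # Assumed selection criterion: largest absolute propagated score.
        candidates = [w for w in graph if w not in seeds and scores[w] != 0.0]
        candidates.sort(key=lambda w: abs(scores[w]), reverse=True)
        for word in candidates[:per_round]:
            # Assumed artificial oracle: label by the sign of the score.
            seeds[word] = 1.0 if scores[word] > 0 else -1.0
    return propagate(graph, seeds, damping=damping), seeds


# Toy chain of words: good - nice - decent - passable (unit edge weights).
graph = {
    "good": {"nice": 1.0},
    "nice": {"good": 1.0, "decent": 1.0},
    "decent": {"nice": 1.0, "passable": 1.0},
    "passable": {"decent": 1.0},
}
manual_seeds = {"good": 1.0}

baseline = propagate(graph, manual_seeds)
boosted, final_seeds = active_propagation(graph, manual_seeds)
```

In this toy graph, `baseline` shows the score shrinking from "nice" to "passable" as the distance from the single manual seed grows, while `boosted` gives "passable" a larger score because "nice" and "decent" were promoted to seeds along the way. Negative seeds (score -1.0) would be handled by the same code.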

Similar resources

Mining Interesting Aspects of a Product using Aspect-based Opinion Mining from Product Reviews (RESEARCH NOTE)

As the internet and its applications grow, e-commerce has become one of its fastest-growing applications. E-commerce customers are given the opportunity to express their opinions about a product on the web as text in the form of reviews. In previous studies, merely finding the sentiment of a review was not enough to capture the exact opinion it expressed. In this paper, we have used A...

Community Detection using a New Node Scoring and Synchronous Label Updating of Boundary Nodes in Social Networks

Community structure is vital for discovering the important structures and potential properties of complex networks. In recent years, improving the quality of local community detection approaches has become a hot spot in the study of complex networks, owing to their linear time complexity and applicability to large-scale networks. However, there are many shortcomings in these methods such as in...

MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection

Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...

Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management, such as identifying the type of transferred data, detecting malware applications, and applying policies to restrict network access. Basic methods in this field used obvious traffic features such as port number and protocol type to classify the traffic type. However, recent changes in applicat...

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Publication date: 2015